(54) Text/image selection from document images

(57) A method carried out in an image processing system in which images of documents are captured by an image capture device, such as a video camera, comprising: (a) displaying successive images captured by the video camera, each image being defined by greyscale image data and containing text matter, (b) receiving a first user input (mouse button click) defining the start of a selection and a first position within the displayed image, (c) in response to the first user input, freezing the displayed image, (d) determining the skew angle of text matter with respect to the field of view of the video camera, (e) receiving at least one further user input (further button click; drag of cursor), including a final user input (mouse button release), defining the end of a selection, and for the or each further user input, (f) determining, using the skew angle determined in step (d), the position, shape and dimensions of a selection element in dependence upon at least said first position, and (g) displaying the selection element superimposed on said frozen displayed image. The selection element may be a rectangle, or a selection block highlighting one or more words of text.

Description

The present invention relates to image processing, and more particularly relates to techniques providing text/image selection from document images.

Conventional word processor applications for the personal computer enable a user to select text or image portions within a document, corresponding to an electronically stored file, by means of button presses and dragging of a mouse cursor.

The situation is quite different when the displayed document is that captured by a document camera providing grayscale, and usually relatively low resolution, images, such as those employed in over-the-desk scanning systems. It is known to use in such over-the-desk scanning systems a video camera disposed above a desk and capturing images of documents which are displayed to a user on a CRT monitor or other display device: these are discussed in detail, for example, in EP-A-622,722 (applicants' reference R/93003K/JDR) and British patent application 9614694.9 (applicants' reference R/96007/JDR). The capture of the document images may be for display in situ, or for transmission to a remote location as part of a videoconferencing tool.

A problem encountered with such systems is how to provide a very efficient text selection interface for interactive face-up document camera scanning applications. There is a need for techniques supporting the selection of regions of text and images within a document captured by the camera via a "click-and-drag" of the mouse defining two points, or a leading diagonal, and for techniques providing, in much the same way as a word processor interface, for single and multi-word text selection from such a document.

The present invention provides a method carried out in an image processing system in which images of documents are captured by an image capture device, comprising: (a) displaying successive images captured by the image capture device, each image being defined by greyscale image data and containing text matter, (b) receiving a first user input defining the start of a selection and a first position within the displayed image, (c) in response to the first user input, freezing the displayed image, (d) determining the skew angle of text matter with respect to the field of view of the image capture device, (e) receiving at least one further user input, including a final user input defining the end of a selection, and for the or each further user input, (f) determining, using the skew angle determined in step (d), the position, shape and dimensions of a selection element in dependence upon at least said first position, and (g) displaying the selection element superimposed on said frozen displayed image.

The method preferably further comprises the step of: (h) extracting the image from within the selection element.

The method preferably further comprises the step of: (i) rotating the extracted image through the determined skew angle (θ), in the opposite sense.

The invention further provides a programmable image processing system when suitably programmed for carrying out the method of any of the appended claims or according to any of the particular embodiments described herein, the system including a processor, and a memory, an image capture device, an image display device and a user input device, the processor being coupled to the memory, image capture device, image display device and user input device, and being operable in conjunction therewith for executing instructions corresponding to the steps of said method(s).
In the case where the user employs a "click-and-drag" of the mouse defining two points, the remaining degree of freedom of the rectangle in the image is the skew of the text. The invention employs skew angle detection techniques to this document camera case, where the location of "skew-pertinent" information is supplied by the user with the mouse, having extracted an intermediate image from the underlying greyscale image. The method is fast enough to find skew within less than 0.5s for most font sizes, which is fast enough to provide a pleasing interface. A similar effect is obtained for single-word and multi-word selection techniques.

Embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:



Figure 1 is a view from above a desk of a document from which a text portion is to be selected in an over-the-desk scanning system according to an embodiment of the present invention;
Figure 2 shows the same view as in Fig. 1, after a user has finished selecting the text portion;
Figure 3 is a flow chart of the steps in providing text selection in accordance with an embodiment of the present invention;
Figure 4 shows the substeps employed in implementing the skew detection step in Fig. 3;
Figure 5 shows the effect of the substep in Fig. 4 of computing a high gradient image;
Figure 6 illustrates the effect of varying the size of the test image portion on the effect of the skew detection substep;
Figure 7(a) shows a portion of captured and displayed text from which a user makes a selection, and Figure 7(b) shows, in magnified form, part of the text matter of Fig. 7(a), showing a selected word;
Figure 8 is a flow chart showing the processing steps performed in providing the selection feedback illustrated in Fig. 7;
Figure 9 shows in more detail the technique (step s14 in Fig. 8) for determining the local inter-word threshold;
Figure 10 shows a histogram formed of horizontal gaps between the connected components of equal font size: (a) the ideal bimodal distribution, (b) real data with two attempted curve fittings, and (c) the fitting of a Gaussian curve;
Figure 11 illustrates text selection by diagonal sweep in an alternative embodiment of the invention; and
Figure 12 is a flow chart of the steps performed in providing the selection feedback shown in Fig. 11.

There are described below various techniques for text and/or image selection. It will be appreciated that these techniques may be used in conjunction with the image enhancement and thresholding techniques described in European patent application EP-A-        , based on British patent application 9711024.1 (applicants' ref: R/97008/JDR), filed concurrently herewith.

A. System configuration 

It will be appreciated that the techniques according to the invention may be employed in any system or application where selection of a text portion from a multibit-per-pixel (e.g. greyscale or colour) image is required. Such instances include videoconferencing systems, scanning systems, especially the aforementioned over-the-desk scanning systems, multifunction devices, and the like. It will be appreciated that the invention may be implemented using a PC running Windows™, a Mac running MacOS, or a minicomputer running UNIX, which are well known in the art. For example, the PC hardware configuration is discussed in detail in The Art of Electronics, 2nd Edn, Ch. 10, P. Horowitz and W. Hill, Cambridge University Press, 1989. In the case of over-the-desk scanning, the invention may form part of the systems described in any of EP-A-495,622, EP-A-622,722, or European patent application EP-A-        , based on British patent application 9614694.9 (applicants' reference R/96007/JDR) filed 12.7.96.

The invention has been implemented in C++ on an IBM compatible PC running Windows® NT.

B. Rectangular text region selection via skew detection 

This section describes a text selection technique that enables rectangular text region selection. The user defines a leading diagonal of the rectangle with a mouse. Automatic text skew detection is used to calculate the required image selection. Skew recovery is made efficient by analysing the image in the neighbourhood of the mouse input.

Figure 1 is a view from above a desk of a document from which a text portion is to be selected in an over-the-desk scanning system incorporating an embodiment of the present invention.

Initially, a document 2 is open on the user's desk (not shown), and the user has positioned the document 2 so that the paragraph 4 which he wishes to scan/copy is within the field of view 6 of the camera (not shown). Images (greyscale) of the document 2 are captured and displayed to the user as feedback. As discussed in the EP-A-         (R/96007/JDR), the content of the field 6 may be displayed (as live video images) within a window of any suitable display device, such as a CRT or LCD display. Using a conventional mouse, the user is able to control the cursor position in a familiar way; and the selection of the paragraph 4 begins with the user pressing the left mouse button with the cursor at initial position 8. While the left mouse button remains pressed, the user makes a generally diagonal line (top left to bottom right): an intermediate cursor position 8' during this motion is shown.

Figure 2 shows the same view as in Fig. 1, after a user has finished selecting the text portion: the end of selection is inputted by the user releasing the left mouse button when the cursor is at the final cursor position 8". As can be seen, the text of document 2 is skewed with respect to the coordinate space of the camera's field of view 6: the angle of skew θ must be determined.

Figure 3 is a flow chart of the steps in providing text selection in accordance with an embodiment of the present invention. Initially, the start of selection user input is detected (step s1). Immediately (step s2), the image (i.e. within the field 6) displayed to the user is frozen on the display device (not shown). Next, a routine (s3) is performed to determine the skew angle θ, as is described in further detail below. Returning to Fig. 2, once the value of θ is obtained, the positions within the coordinate space of the display window of a selection rectangle 10 which is to be displayed as feedback to the user must be determined; the requirement being that, to provide a pleasing interface for the user, the selection rectangle 10 must be at the same skew angle θ. The coordinates ((x, y), (x', y')) corresponding to the initial and current, respectively, cursor positions 8, 8" are known. Using simple geometric relations, the coordinates (a, b) and (c, d) of the other corners of the rectangle 10 can readily be calculated. The skew angle θ is normally a small angle: generally it will be less than 5°.
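By way of illustration only, the "simple geometric relations" referred to above might be sketched as follows. The function name `skewed_rectangle`, the use of degrees, and the sign convention for θ (sides parallel to the direction (cos θ, sin θ) in display coordinates) are assumptions made for this sketch and are not taken from the specification.

```python
import math

def skewed_rectangle(p, q, theta_deg):
    """Return corners [p, a, q, c] of a rectangle whose diagonal runs p -> q
    and whose sides are tilted by theta_deg relative to the display axes.

    p, q      -- (x, y) and (x', y'): initial and current cursor positions
    theta_deg -- skew angle in degrees (sign convention is an assumption)
    """
    theta = math.radians(theta_deg)
    ux, uy = math.cos(theta), math.sin(theta)   # unit vector along the text lines
    vx, vy = -uy, ux                            # unit vector across the text lines
    dx, dy = q[0] - p[0], q[1] - p[1]
    along = dx * ux + dy * uy                   # diagonal resolved along the text
    across = dx * vx + dy * vy                  # diagonal resolved across the text
    a = (p[0] + along * ux, p[1] + along * uy)    # corner (a, b)
    c = (p[0] + across * vx, p[1] + across * vy)  # corner (c, d)
    return [p, a, q, c]
```

With θ = 0 the result reduces to the usual axis-aligned rectangle; for small θ the corners (a, b) and (c, d) shift by only a few pixels, consistent with the skew normally being less than 5°.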

As shown in Fig. 3, a rectangle is formed (step s5) with (x, y), (x', y'), (a, b) and (c, d) at the corners. This rectangle is then superimposed (step s6) on the stored frozen image data, and the resulting image displayed. A test is then made at step s7 to determine whether the user has finished selecting (i.e. an input has been received indicating that he has released the left mouse button); if he has not, processing returns to step s4. (For illustration, the final cursor position 8" is used as the current cursor position.)

If the user has finished selecting, the current image is frozen (step s8) in the display (window). Then the image data for the image (here: the paragraph 4) present within the selection rectangle 10 is extracted (step s9) from that for the image.

Figure 4 shows the substeps employed in implementing the skew detection step in Fig. 3. This routine is based on the techniques described in US-A-5,355,420: maximising the variance of the laterally projected profile of differences over a range of skew angles, where the rotation of the image is made efficient by only performing vertical shears. The process begins (step s31) by initialising a grab area 12 (a small rectangular area to the right and below the cursor position such as that shown in Fig. 6 (discussed further below)). Suitably, the grab area 12 is just large enough for a few lines of text, and perhaps a couple of lines of 10 point text.

In order to minimise the amount of time taken to compute skew, we attempt to analyse the smallest amount of the image possible (less than two lines has been found to suffice), but the problem is clearly that it is not known how large the font size is before the skew angle has been determined.

To this end, an initial sample size (grab area 12) that is large enough to capture several lines of text at a "most likely" font size of between 10-12 pt is used. Further, this initial region is to the right and below the initial "click" position. This range is taken as the most common font size encountered in using the system.

The next step (s32) involves computation of a high gradient image from the image within the initial grab area 12. The images of the document in Fig. 1 are greyscale images. An option is to threshold the image and then pass it to the skew detection routine. However, under uncontrolled lighting conditions, simple thresholding is unreliable.

Figure 5 shows the effect of the substep in Fig. 4 of computing the high gradient image, which is accomplished as described in Jähne B., Digital Image Processing, section 6.3.2, Springer-Verlag. In the resulting image of Fig. 5, each white pixel is the result of the gradient in the original (greyscale) image at that point being greater than a predetermined threshold, and each black pixel is the result of the gradient in the original (greyscale) image at that point being less than the predetermined threshold. The high gradient image (used as input for skew detection) is easily computed from the greyscale image supplied by the document camera and is a very reliable substitute for a fully thresholded image.
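A minimal sketch of such a high gradient image is given below, for illustration only. The choice of the Sobel operator, the |gx| + |gy| magnitude approximation, the function name `high_gradient_image` and the list-of-rows image representation are all assumptions of this sketch; the specification states only that a gradient is compared against a predetermined threshold.

```python
def high_gradient_image(grey, threshold):
    """Return a 0/1 image: 1 ("white") where the local gradient of the
    greyscale image exceeds threshold, 0 ("black") elsewhere.

    grey -- list of rows, each a list of intensities (assumed representation)
    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(grey), len(grey[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Sobel horizontal and vertical derivative estimates (assumed operator)
            gx = (grey[r - 1][c + 1] + 2 * grey[r][c + 1] + grey[r + 1][c + 1]
                  - grey[r - 1][c - 1] - 2 * grey[r][c - 1] - grey[r + 1][c - 1])
            gy = (grey[r + 1][c - 1] + 2 * grey[r + 1][c] + grey[r + 1][c + 1]
                  - grey[r - 1][c - 1] - 2 * grey[r - 1][c] - grey[r - 1][c + 1])
            if abs(gx) + abs(gy) > threshold:
                out[r][c] = 1
    return out
```

Because only intensity differences are compared, a slowly varying illumination level across the grab area has little effect on the result, which is the property that makes the high gradient image attractive under uncontrolled lighting.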

A skew analysis is then performed on the image data for the high gradient image for each of the allowed set of skew angles (although any suitable regime may be employed; see US-A-5,187,753 and US-A-5,355,420): step s33. In each case, the image is sheared (step s34) to approximate the rotation. A technique is used in the vertical shearing procedure that lies at the heart of the angular searches to wrap around the vertical shift. In other words, the pixels that are pushed out of the top of the region are re-inserted at the bottom in the same column, so that the variance is always computed over a rectangular region.

For the given angle, a lateral projection histogram for the image is computed (step s35). Based on the histogram data, the variance for the given angle is calculated (step s36). A plot of variance against angle (of rotation) may thus be plotted, as shown in Fig. 6(a). The ability of the technique to determine the skew angle depends on the size of the initial grab area 12 relative to the font size; and the absence of a discernible peak in Fig. 6(a) indicates that the computation has been unsuccessful. A test is made at step s37 to determine whether the highest peak in the plot of skew confidence versus angle is significant (a simple SNR based test), such as by determining whether the ratio of the peak value to the average value is greater than a predetermined value. If the peak is not significant, the size of the initial grab area 12 is increased (step s38), and the processing returns to step s32. The grab area 12 is expanded in the vertical direction more than the horizontal, as it is in that direction that the most skew-pertinent information lies. This is done until an empirically defined threshold on the SNR (in this case defined to be the maximum variance divided by the mean variance) is reached.
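The shear, projection, variance and significance test of steps s33 to s37 might be sketched as below. This is a simplified illustration rather than the routine of the cited US patents: it takes the variance of plain row sums of the high gradient image (not of a profile of differences), and the names `projection_variance`, `detect_skew` and the default `min_snr` value are assumptions.

```python
import math

def projection_variance(img, angle_deg):
    """Variance of the row-sum profile of a 0/1 image after a vertical shear
    approximating rotation by angle_deg. Pixels shifted past the top or
    bottom wrap around within their own column."""
    h, w = len(img), len(img[0])
    slope = math.tan(math.radians(angle_deg))
    profile = [0] * h
    for c in range(w):
        shift = int(round(c * slope))          # vertical shear for this column
        for r in range(h):
            if img[r][c]:
                profile[(r - shift) % h] += 1  # wrap-around re-insertion
    mean = sum(profile) / h
    return sum((v - mean) ** 2 for v in profile) / h

def detect_skew(img, angles, min_snr=2.0):
    """Return the angle with maximum projection variance, or None when the
    peak is not significant (peak variance / mean variance < min_snr)."""
    scores = [projection_variance(img, a) for a in angles]
    peak = max(scores)
    mean = sum(scores) / len(scores)
    if mean == 0 or peak / mean < min_snr:
        return None
    return angles[scores.index(peak)]
```

A `None` result corresponds to the "peak not significant" branch of step s37, upon which a caller would enlarge the grab area and try again.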

Figure 6 illustrates the effect of varying the size of the grab area 12 on the effect of the skew detection substep, in the case where the font size is 36pt. Clearly, a significant peak is ascertained for Fig. 6(b), from which a skew angle of 0.35° can be derived. This shows that very little text is needed for a good skew confidence. The grab area 12 of Fig. 6(b) is sufficient for the determination, and there is no need to expand to the larger area of Fig. 6(c). In a preferred embodiment, the first grab area 12 is 100x100 pixels, the next largest is 200x200 pixels, and the next largest 300x300 pixels. If the latter fails, a value of θ=0 is returned.
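The escalation through successively larger grab areas can be sketched as follows. The callback `estimate` is hypothetical: it stands for any routine that returns a skew angle for a given box, or `None` when no significant peak is found. The box layout (left, top, right, bottom) and the function name are likewise assumptions of this sketch.

```python
def skew_with_growing_grab_area(estimate, click, sizes=(100, 200, 300)):
    """Try square grab areas of increasing size, each placed to the right of
    and below the click position; return 0.0 if every size fails.

    estimate -- hypothetical callback: box -> angle in degrees, or None
    click    -- (x, y) position of the first user input
    """
    x, y = click
    for size in sizes:
        angle = estimate((x, y, x + size, y + size))
        if angle is not None:
            return angle           # significant peak found at this size
    return 0.0                     # largest area failed: fall back to no skew
```

Starting with the smallest area keeps the common case (10 to 12 pt text) fast, while larger fonts incur the extra cost of one or two further attempts.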

The above description outlines the situation where text matter (paragraph 4) is sought to be selected by the user. However, the techniques according to the invention may be used for the selection of graphical objects within a document, and the aforementioned techniques have also been found to work well with graphics and line drawings.

The algorithm described in this section is very efficient, and the delay between starting to drag out the leading diagonal and the skew being detected is of the order of 0.5s on standard PC hardware, and slightly longer for larger and less common font sizes.

In addition, it will be appreciated that provision may be made, suitably using techniques for the resizing and moving of windows in the MS Windows environment, allowing the user to resize and/or reposition the selection rectangle 10 after it has been formed by releasing the left mouse button.

C. Single and multi-word selection methods 

This section describes a single and multi-word text selection process, the aim being to imitate a common word processor interface, i.e. double click selects a word and "click-and-drag" may define a non-rectangular text region.

Figure 7(a) shows a portion 13 of captured and displayed text from which a user makes a selection. The selection may be of a single word, or of multiple consecutive words.

In this case, the user selects, using a cursor controlled by a mouse (not shown) in the conventional manner, from a portion 13 of displayed text matter a word 14 ("compression"): the user performs a "double-click" with the left mouse button with the mouse cursor in an initial position 8. As is shown (slightly exaggerated for the sake of illustration), the text matter is skewed by an angle θ with respect to the display coordinate space. Appropriate feedback must be displayed to the user, overlaid on the word 14, to show that it has been selected.

Figure 7(b) shows in magnified form part of the text matter of Fig. 7(a), showing a selected word. To indicate selection, a selection block 20 is displayed overlaid on the word 14. (Here the block 20 is shown using hatching for the sake of illustration, but generally will comprise a solid black or coloured block, with the characters of the word 14 appearing as white or "reversed out".) The selection block 20 has vertical sides 22, 24 and horizontal sides 26, 28. The sides 22, 24 are positioned midway between the selected word 14 and the two adjacent words in the line ("analog" and "curve" respectively), and for this, computations based on measured values of the inter-character separation (s_c) and the inter-word spacing (s_w) must be made, as described further below.

In addition, the sides 26, 28 are positioned midway between the line containing the selected word 14 and the line of text above and below it, respectively. The sides 26, 28 are also skewed by θ with respect to the horizontal dimension of the display, thereby providing appropriately-oriented selection feedback (block 20).

Figure 8 is a flow chart showing the processing steps performed in providing the selection feedback illustrated in Fig. 7. Initially (step s11), a user's double click of the left mouse button (i.e. first and second user inputs in rapid succession) is detected; and the displayed image is immediately frozen (step s12; although it will be appreciated that the freezing will occur upon the first of the "clicks" being made).

An operation is then performed (step s13), using a small region near (typically below and to the right of) the initial cursor position 8 (at the first mouse click), to determine the angle θ at which the text is skewed: this is described in detail above, with reference to Fig. 4. Then, a routine (s14) is performed to generate, for the small local region, an estimate of the inter-word spacing (threshold) (s_w)min (corresponding to (s_c)max): the threshold spacing above which the spacing must be an inter-word spacing rather than an inter-character spacing. A determination is then made (step s15), using known techniques, of the line separation within the small local region: this is the separation between the maximum height of characters on one line and the lowest level for characters on the line above it; and this enables the positions of the sides 26, 28 of the selection block 20 (Fig. 7(b)) to be determined.

In step s16 this determination is made, together with a calculation of the positions of the sides 22, 24 of the selection block 20: side 22 is ½(s_w)min to the left of the character "c" in the selected word 14 ("compression"), and side 24 is ½(s_w)min to the right of the "n" in the selected word 14. The selection block 20 with these sides is formed in step s17, and then in step s18 the selection block 20 is overlaid on the frozen image and the result displayed.
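For illustration, the placement of the sides 22, 24 might be sketched as follows, working along the de-skewed text direction. The representation of a line as a sorted, non-empty list of character extents, and the name `select_word`, are assumptions of this sketch; the grouping rule (a gap larger than (s_w)min separates words) and the half-threshold margin follow the description above.

```python
def select_word(char_spans, click_x, sw_min):
    """Return (side_22, side_24) for the word under click_x, or None.

    char_spans -- non-empty list of (left, right) character extents on one
                  text line, sorted left to right, measured along the
                  de-skewed text direction (assumed representation)
    sw_min     -- inter-word threshold (s_w)min
    """
    # Group characters into words: a gap above the threshold starts a new word.
    words = [[char_spans[0]]]
    for prev, cur in zip(char_spans, char_spans[1:]):
        if cur[0] - prev[1] > sw_min:
            words.append([cur])
        else:
            words[-1].append(cur)
    half = sw_min / 2.0
    for word in words:
        left, right = word[0][0], word[-1][1]
        if left - half <= click_x <= right + half:
            # Side 22 is half the threshold left of the first character,
            # side 24 half the threshold right of the last character.
            return left - half, right + half
    return None
```

The positions returned are in the de-skewed frame; mapping them back through the skew angle θ gives the sides as drawn on the frozen image.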

If the user has finished selecting, the current image is frozen (step s8) in the display (window). Then, the image data for the image (here: the word 14) present within the selection block 20 is extracted (step s19) from that for the image, and the extracted image is then rotated (step s20) through -θ, so as to ready it for further processing, such as OCR.

Figure 9 shows in more detail the technique (step s14 in Fig. 8) for determining the local inter-word threshold (s_w)min. Initially (step s141), for the local region and using techniques known in the art for computing connected components, the character-character separations are measured for each pair of adjacent characters. Here, the previously obtained skew information (θ) is used to make the O'Gorman Docstrum techniques (Lawrence O'Gorman, "The Document Spectrum for Page Layout Analysis", in IEEE Transactions on PAMI, Vol 15, No. 11, Nov 1993) run faster. O'Gorman used a connected component nearest neighbours method to find skew and inter-character and inter-line spacing. We use the skew information to find nearest neighbours in the line to give us inter-character information, and connected component heights to group blocks of consistent font size.

A histogram is formed of horizontal gaps between the connected components of equal font size. This is ideally a bimodal distribution (see Fig. 10(a)), i.e. with a first peak (mode) 36 corresponding to inter-character spacings (s_c), and a second peak (mode) 38 corresponding to inter-word spacings (s_w). Figure 10(b) shows a plot 40 of the real measured values for a typical sample, together with two curves sought to be fitted to the plot of the real data curve 40. In Fig. 10(c), a Gaussian curve is fitted; the best fit is found using 2k values about the value i=m, where m is the mode of the histogram h(i), by computing:

$$\mu_k = \frac{1}{2k+1} \sum_{i=m-k}^{m+k} h(i) \qquad \text{(1a)}$$

$$\sigma_k^2 = \frac{1}{2k+1} \sum_{i=m-k}^{m+k} \bigl(h(i) - \mu_k\bigr)^2 \qquad \text{(1b)}$$

The proportion of data in (2k+1) values around the mode is given by:

$$p_k = \frac{\sum_{i=m-k}^{m+k} h(i)}{\sum_{j \ge 0} h(j)} \qquad \text{(1c)}$$
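As a hedged illustration, the quantities of equations (1a) to (1c) can be evaluated directly from a histogram as shown below. The function name `mode_statistics`, the treatment of bins outside the histogram as zero, and the choice of the first maximum as the mode are assumptions of this sketch; the formulas are implemented as printed.

```python
def mode_statistics(h, k):
    """Evaluate (1a)-(1c) for histogram h over the 2k+1 bins around its mode.

    Returns (m, mu_k, sigma_k_squared, p_k). Bins outside the histogram are
    treated as empty (an assumption), and ties for the mode resolve to the
    lowest bin index.
    """
    m = max(range(len(h)), key=lambda i: h[i])          # mode of the histogram
    window = [h[i] if 0 <= i < len(h) else 0
              for i in range(m - k, m + k + 1)]
    n = 2 * k + 1
    mu = sum(window) / n                                # equation (1a)
    var = sum((v - mu) ** 2 for v in window) / n        # equation (1b)
    p = sum(window) / sum(h)                            # equation (1c)
    return m, mu, var, p
```

Evaluating these for several values of k gives a family of candidate fits around the inter-character peak, from which the best-fitting curve can then be chosen.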

At step s145, an analysis is made of whether the curve is a good fit: this is done using the well known Chi-squared test.

Reference is made to Pavlidis T. and Zhou J., "Page Segmentation and Classification", CVGIP: Graphical Models and Image Processing, Vol 54, No 6, Nov 1992, pp 484-496.

It will be appreciated that provision may be made for the selection, upon a third mouse click, of the whole sentence containing the word selected by the double click; in that case the double click becomes an intermediate user input and the third click the final user input.

Furthermore, it will be appreciated that, through simple modification of the techniques of Figs 8-10 and using the techniques of determining text (column) limits, techniques may be provided for indicating selection in an alternative embodiment, where there is a first user input at the first mouse click (with the left mouse button pressed), an infinite number of intermediate "user inputs" as the cursor is dragged across the text, and the final user input defining the end of selection when the left mouse button is released. This is illustrated in Fig. 11 (the column limits are omitted for the sake of clarity/illustration).

Figure 12 is a flow chart of the steps performed in providing the selection feedback shown in Fig. 11. The process begins with the user's left mouse button click to start selection. The skew angle is determined as described hereinabove, and then the column boundaries are derived using techniques based on the abovementioned work of Pavlidis.

Next, a small vertical portion of the column is segmented. With knowledge of skew angle and column boundaries, this step could simply be to segment the whole column to locate the position of each of the words. However, word segmentation tends to be a slow operation, so instead, we divide the column into small horizontal strips and segment each of these separately. This segmentation process operates in a separate thread of program execution, thus allowing the user to freely continue moving the cursor, and for the system to update the selection display. This leads to a relatively fast interactive display, whilst at the same time allowing anything from a single word to a whole column to be selected.
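The strip-by-strip segmentation in a separate thread might be arranged as in the sketch below. The specification's implementation was in C++ under Windows NT; this Python sketch, the function name `segment_strips_in_background`, the hypothetical `segment` callback and the `None` end marker are assumptions made purely for illustration.

```python
import queue
import threading

def segment_strips_in_background(strips, segment, results):
    """Segment each horizontal strip on a worker thread, posting
    (strip_index, words) to results as each strip completes, followed by
    None as an end marker.

    strips  -- iterable of strip data (whatever `segment` accepts)
    segment -- hypothetical word-segmentation callback for one strip
    results -- queue.Queue that the user-interface thread polls
    """
    def worker():
        for index, strip in enumerate(strips):
            results.put((index, segment(strip)))
        results.put(None)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The interface thread can drain the queue between cursor events and extend the displayed selection as soon as the strips under the cursor have been segmented, rather than waiting for the whole column.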

As shown in Fig. 11, the selection block 30 thus formed is constituted by an upper block (covering the selected text on one line) extending from the left side 22' to a side coincident with the column boundary and having a lower side 32, and a lower block (covering the selected text on an adjacent line) extending from the side 24' and having an upper side 24. It will be appreciated that where the selection extends over 3 or more lines then the selection block also includes an intermediate block, between the upper and lower ones, which extends between the column boundaries.
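The composition of the selection block from upper, intermediate and lower blocks can be sketched as follows. Positions are expressed as (line index, x) pairs along the de-skewed text direction; this representation, the function name `selection_blocks`, and the assumption that the drag runs from an earlier line to a later one are all choices made for this sketch.

```python
def selection_blocks(first, last, col_left, col_right):
    """Return a list of (line, x_start, x_end) spans making up the selection.

    first, last         -- (line_index, x) of the first and final positions,
                           with first on the same or an earlier line (assumed)
    col_left, col_right -- column limits along the text direction
    """
    (line0, x0), (line1, x1) = first, last
    if line0 == line1:
        # Single-line selection: one block between the two positions.
        return [(line0, min(x0, x1), max(x0, x1))]
    blocks = [(line0, x0, col_right)]                    # upper block
    blocks += [(line, col_left, col_right)               # intermediate lines
               for line in range(line0 + 1, line1)]
    blocks.append((line1, col_left, x1))                 # lower block
    return blocks
```

For a selection spanning exactly two lines the intermediate range is empty, so only the upper and lower blocks are produced.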

Copy to Clipboard 

Once the user has selected a region, the next step is to copy it to the Windows clipboard. As previously described, this can be done in a number of different ways. The operations that are performed on the selected region prior to it being copied depend not only on the way in which it is copied, but also on the way in which it was selected; Table 1 highlights the operations necessary.



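Table 1
Operations necessary for copying a selected region

                  | Selection method:
Copy as:          | Rectangular selection box   | Skewed selection box        | Word-to-word selection
------------------+-----------------------------+-----------------------------+-----------------------------
Text              | Copy region                 | Copy region                 | Copy region
                  | Binarise                    | Rotate                      | Rotate
                  | OCR                         | Binarise                    | Mask unwanted text
                  |                             | OCR                         | Binarise
                  |                             |                             | OCR
------------------+-----------------------------+-----------------------------+-----------------------------
Color image       | Copy region                 | Copy region                 | Copy region
                  |                             | Rotate                      | Rotate
                  |                             |                             | Mask unwanted text
------------------+-----------------------------+-----------------------------+-----------------------------
Grey-scale image  | Copy region                 | Copy region                 | Copy region
                  | Convert color to grey-scale | Rotate                      | Rotate
                  |                             | Convert color to grey-scale | Mask unwanted text
                  |                             |                             | Convert color to grey-scale
------------------+-----------------------------+-----------------------------+-----------------------------
Binary image      | Copy region                 | Copy region                 | Copy region
                  | Binarise                    | Rotate                      | Rotate
                  |                             | Binarise                    | Mask unwanted text
                  |                             |                             | Binarise
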

For example: if the region was selected using the skewed selection box and a color image is required, we first make a local copy of the selected region, de-skew by rotating it through the skew angle, and then place it on the clipboard. A more complex example is copying as text following a word-to-word selection. In this case, it is also necessary to mask out unwanted text from the beginning and end of the first and last lines of text. This is followed by converting the color image to a black and white image (binarisation), which is then passed to the OCR engine. Finally, the text returned by the OCR engine is then placed on the clipboard.
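The dependence of the copy operations on both the selection method and the target format, as set out in Table 1, can be captured in a few lines. The string labels and the function name `copy_operations` below are illustrative assumptions; only the order of operations is taken from the table.

```python
def copy_operations(selection, copy_as):
    """Return the ordered list of operations from Table 1.

    selection -- "rectangular box", "skewed box" or "word-to-word"
    copy_as   -- "text", "color image", "grey-scale image" or "binary image"
    (these labels are assumptions made for this sketch)
    """
    ops = ["copy region"]
    if selection in ("skewed box", "word-to-word"):
        ops.append("rotate")                 # de-skew through the skew angle
    if selection == "word-to-word":
        ops.append("mask unwanted text")     # trim first and last lines
    final_steps = {
        "text": ["binarise", "OCR"],
        "color image": [],
        "grey-scale image": ["convert color to grey-scale"],
        "binary image": ["binarise"],
    }
    return ops + final_steps[copy_as]
```

For instance, `copy_operations("word-to-word", "text")` yields the five-step sequence of the more complex example above: copy region, rotate, mask unwanted text, binarise, OCR.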

Of all these operations, one of the most important is the binarisation stage, particularly when followed by OCR. Due to the low resolution of the camera images, coupled with possible lighting variations, unacceptable results will be obtained if the camera image is binarised using a simple threshold algorithm. Therefore, the image enhancement and thresholding techniques of British patent application 9711024.1 (ref. R/97008) are suitably used.

It will be further appreciated that a function may be provided (selectable by the user using a button on a toolbar of the UI) for when the user is invoking the 'Copy as text' function, enabling the line breaks to be removed from the OCRed text. This is useful, for example, when the text is to be pasted into a word processor. Furthermore, another such toolbar button may provide the option of the user viewing the selection they have copied in a preview window, in a manner similar to the clipboard viewer on a conventional PC.

Claims 

1. A method carried out in an image processing system in which images of documents are captured by an image capture device, comprising:

(a) displaying successive images captured by the image capture device, each image being defined by greyscale image data and containing text matter,
(b) receiving a first user input defining the start of a selection and a first position within the displayed image,
(c) in response to the first user input, freezing the displayed image,
(d) determining the skew angle (θ) of text matter with respect to the field of view of the image capture device,
(e) receiving at least one further user input, including a final user input defining the end of a selection, and for the or each further user input,
(f) determining, using the skew angle (θ) determined in step (d), the position, shape and dimensions of a selection element in dependence upon at least said first position, and
(g) displaying the selection element superimposed on said frozen displayed image.

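2. The method of claim 1, wherein step (d) comprises determining the skew angle using a portion of the image at or adjacent said first position.

3. The method of claim 1 or 2, wherein said final user input defines a second position within the displayed image, and said selection element comprises a rectangle having opposite corners at said first position and said second position.

4. The method of claim 1 or 2, wherein the selection comprises the selection of one or more words of said text matter within said displayed image, and said selection element comprises a selection block highlighting said one or more words.
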
5. The method of claim 4, wherein step (f) comprises the substeps of:

(f1) determining a word separation value (s_w)min from measured values of separation between adjacent pairs of characters, and
(f2) determining the dimensions of the selection block in the direction of flow of said text matter as a function of the word separation value (s_w)min determined in step (f1).

6. The method of claim 5, wherein step (f1) comprises the substeps of:

(f1i) using a portion of said text matter, preferably at or adjacent said first position, forming a histogram of frequency versus inter-character spacing for each pair of adjacent characters within said portion,
(f1ii) using a plurality of different Gaussian curves, determining which curve is a best fitting curve, said best-fitting curve forming a fit with a predetermined mode of the histogram formed in step (f1i), and
(f1iii) determining an estimate point from said best-fitting curve, the word separation value (s_w)min being given by said estimate point.

7. The method of claim 6, wherein the estimate point corresponds to the value (s_w)min = μ_k + 3σ_k, where:

$$\mu_k = \frac{1}{2k+1} \sum_{i=m-k}^{m+k} h(i)$$

and

$$\sigma_k^2 = \frac{1}{2k+1} \sum_{i=m-k}^{m+k} \bigl(h(i) - \mu_k\bigr)^2.$$


8. The method of any of claims 4 to 7, wherein step (f) further comprises:

(f3) determining the line spacing (s_l) between adjacent lines of said text matter, and
(f4) determining the dimensions of the selection block in the direction perpendicular to the flow of said text matter as a function of said line spacing (s_l); and/or wherein step (f) further comprises:
(f5) determining the horizontal (column) limits of the text matter, and
(f6) determining whether said first and second positions are on different lines of said text matter.

9. The method of claim 8, wherein, if the determination in step (f6) is positive, step (f2) further comprises:

(f2i) for an upper portion of the selection block, overlaying text matter between said first position and the right hand horizontal (column) limit of the text matter, and
(f2ii) for a lower portion of the selection block, overlaying text matter between said second position and the left hand horizontal (column) limit of the text matter; and/or wherein step (f2) further comprises:
(f2iii) where the line of text containing said first position is separated by one or more further lines from the line of text containing said second position, for an internal portion of said selection block overlaying said one or more further lines, using as the left hand and right hand sides of said internal portion the left hand and right hand, respectively, horizontal (column) limits of said text matter.


10. A programmable image processing system when suitably programmed for carrying out the method of any of the preceding claims, the system including a processor, and a memory, an image capture device, an image display device and a user input device, the processor being coupled to the memory, image capture device, image display device and user input device, and being operable in conjunction therewith for executing instructions corresponding to the steps of said method(s).